Skip to content

feat(core): add opt-in periodic ping for connection health monitoring#1717

Open
travisbreaks wants to merge 4 commits intomodelcontextprotocol:mainfrom
travisbreaks:feat/periodic-ping
Open

feat(core): add opt-in periodic ping for connection health monitoring#1717
travisbreaks wants to merge 4 commits intomodelcontextprotocol:mainfrom
travisbreaks:feat/periodic-ping

Conversation

@travisbreaks
Copy link
Copy Markdown

Summary

  • Adds pingIntervalMs option to ProtocolOptions that enables automatic periodic pings to verify the remote side is still responsive
  • Implemented at the Protocol level so both Client and Server benefit
  • Client starts periodic pings after successful initialization; Server starts after receiving the initialized notification
  • Disabled by default (opt-in). Ping failures are reported via onerror without stopping the timer
  • Timer uses unref() so it does not prevent clean Node.js process exit

Per the MCP specification: "Implementations SHOULD periodically issue pings to detect connection health" with "configurable frequency."

Usage

const client = new Client(
    { name: 'my-client', version: '1.0.0' },
    { pingIntervalMs: 30_000 } // ping every 30 seconds
);

client.onerror = (error) => {
    console.warn('Connection issue:', error.message);
};

await client.connect(transport);
// Periodic pings start automatically after initialization

Key design decisions

  • Protocol-level implementation: both Client and Server can use periodic pings, since the spec says "implementations" (not just clients)
  • Opt-in, not opt-out: existing behavior is unchanged; you must explicitly set pingIntervalMs to enable
  • Ping timeout equals the interval: if a ping takes longer than one interval to respond, it fails with a timeout error rather than stacking up
  • unref() on the timer: the periodic ping does not keep the Node.js process alive if all other work is done

Relationship to other PRs

This complements PR #1712 (host process watchdog for stdio orphan detection). That PR detects when the host process is gone at the OS level; this PR detects when the remote MCP peer is unresponsive at the protocol level.

Fixes #1000

Test plan

  • should not send periodic pings when pingIntervalMs is not set (default behavior unchanged)
  • should send periodic pings when pingIntervalMs is set and startPeriodicPing is called
  • should stop periodic pings on close
  • should report ping errors via onerror without stopping the timer
  • should not start duplicate timers if startPeriodicPing is called multiple times
  • should stop periodic pings when transport closes unexpectedly
  • All 446 core tests pass
  • All 386 integration tests pass
  • Build succeeds
  • Lint passes

Changeset

@modelcontextprotocol/core: minor, @modelcontextprotocol/client: minor, @modelcontextprotocol/server: minor

Co-Authored-By: Claude Opus 4.6 (1M context) noreply@anthropic.com

@travisbreaks travisbreaks requested a review from a team as a code owner March 19, 2026 18:44
@changeset-bot
Copy link
Copy Markdown

changeset-bot bot commented Mar 19, 2026

🦋 Changeset detected

Latest commit: 5116b16

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 7 packages
Name Type
@modelcontextprotocol/core Minor
@modelcontextprotocol/client Minor
@modelcontextprotocol/server Minor
@modelcontextprotocol/node Major
@modelcontextprotocol/express Major
@modelcontextprotocol/fastify Major
@modelcontextprotocol/hono Major

Not sure what this means? Click here to learn what changesets are.

Click here if you're a maintainer who wants to add another changeset to this PR

@pkg-pr-new
Copy link
Copy Markdown

pkg-pr-new bot commented Mar 19, 2026

Open in StackBlitz

@modelcontextprotocol/client

npm i https://pkg.pr.new/@modelcontextprotocol/client@1717

@modelcontextprotocol/server

npm i https://pkg.pr.new/@modelcontextprotocol/server@1717

@modelcontextprotocol/express

npm i https://pkg.pr.new/@modelcontextprotocol/express@1717

@modelcontextprotocol/fastify

npm i https://pkg.pr.new/@modelcontextprotocol/fastify@1717

@modelcontextprotocol/hono

npm i https://pkg.pr.new/@modelcontextprotocol/hono@1717

@modelcontextprotocol/node

npm i https://pkg.pr.new/@modelcontextprotocol/node@1717

commit: 5116b16

@felixweinberger
Copy link
Copy Markdown
Contributor

@claude review

Implements the missing periodic ping functionality specified in issue modelcontextprotocol#1000.
Per the MCP specification, implementations SHOULD periodically issue pings
to detect connection health, with configurable frequency.

Changes:
- Add `pingIntervalMs` option to `ProtocolOptions` (disabled by default)
- Implement `startPeriodicPing()` and `stopPeriodicPing()` in Protocol
- Client starts periodic ping after successful initialization
- Server starts periodic ping after receiving initialized notification
- Timer uses `unref()` so it does not prevent clean process exit
- Ping failures are reported via `onerror` without stopping the timer
- Timer is automatically cleaned up on close or unexpected disconnect

Fixes modelcontextprotocol#1000

Co-authored-by: Br1an67 <932039080@qq.com>
Mirrors the Go SDK pattern where the initialized notification handler
is a named method rather than an inline closure.
@felixweinberger
Copy link
Copy Markdown
Contributor

@claude review

travisbreaks and others added 2 commits April 2, 2026 00:32
The reconnection path (when transport.sessionId is set) returns early,
skipping the startPeriodicPing() call that runs after normal
initialization. Since _onclose() clears the ping timer during
disconnection, connection health monitoring silently stops working
after any reconnect. Adding startPeriodicPing() before the early
return ensures pings resume on reconnection.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@felixweinberger
Copy link
Copy Markdown
Contributor

@claude review

Comment on lines 801 to 804
async close(): Promise<void> {
this.stopPeriodicPing();
await this._transport?.close();
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔴 When close() is called while a periodic ping is in-flight, _onclose() rejects all pending response handlers with SdkError(ConnectionClosed), causing the ping catch block to fire this._onerror() with that error. Users monitoring onerror for connection health will receive spurious Connection closed errors during normal, intentional shutdowns that are indistinguishable from genuine connection failures. Fix: add a _closing boolean flag set before closing the transport, and check it in the ping catch block to suppress the error.

Extended reasoning...

The Race Condition

The bug lives in startPeriodicPing() at packages/core/src/shared/protocol.ts around lines 778-804. The setInterval callback is async, so once it fires and awaits _requestWithSchema(), JavaScript cannot cancel it. Clearing the interval only prevents future firings; the already-started async function continues running. This is the core of the race.

Exact Code Path

  1. The setInterval fires. The async callback begins executing and calls _requestWithSchema({ method: 'ping' }, ...), which registers a response handler in _responseHandlers and suspends at the await.
  2. Before the server responds, the caller invokes close().
  3. close() calls stopPeriodicPing(), which does clearInterval(this._pingTimer). The in-flight async callback has already started; clearing the interval only prevents future firings.
  4. close() then calls this._transport?.close(), which triggers the transport onclose callback.
  5. That callback invokes _onclose(), which iterates over all entries in _responseHandlers and rejects each one with new SdkError(SdkErrorCode.ConnectionClosed, 'Connection closed').
  6. The pending _requestWithSchema promise now rejects with the ConnectionClosed error.
  7. The catch block in startPeriodicPing executes: since SdkError extends Error, this._onerror(error) fires.
  8. The user's onerror callback receives a SdkError(ConnectionClosed) during a normal, user-initiated shutdown.

Why Existing Code Does Not Prevent It

The close() method does call stopPeriodicPing() before this._transport?.close(), which looks like it should protect against this. But stopPeriodicPing() only prevents new timer firings. It cannot retroactively cancel an async function that has already started and is awaiting a promise. There is no mechanism in the ping catch block to distinguish intentional close from a genuine connection failure.

Impact

Any consumer that sets onerror to monitor connection health (the primary intended use case per the PR description) will see false positive ConnectionClosed errors every time close() is called while a ping happens to be in-flight. Since pings fire on a regular interval, the race is probabilistic but reliably reproducible at scale or in tests. More critically, these errors are indistinguishable from a real remote-side failure, so automated reconnection logic or alerting built on onerror will misfire.

How to Fix

Add a private _closing boolean field to Protocol, initialized to false. In close(), set this._closing = true before calling stopPeriodicPing() and this._transport?.close(). In the ping catch block, guard the onerror call:

} catch (error) {
    if (!this._closing) {
        this._onerror(error instanceof Error ? error : new Error('Periodic ping failed: ' + String(error)));
    }
}

Reset _closing to false in _onclose() after cleanup so that a reconnected protocol instance works correctly. The _closing flag cleanly suppresses the error regardless of what exact error code the transport produces on shutdown.

Step-by-Step Proof

T=0   setInterval fires; pingCallback() begins executing
T=0   _requestWithSchema registers handler H in _responseHandlers, yields at await
T=1   close() is called by user
T=1   stopPeriodicPing() clears the timer -- in-flight pingCallback is unaffected
T=1   this._transport.close() called
T=1   transport triggers onclose -> _onclose() runs
T=1   _onclose(): for each handler in responseHandlers -> handler(SdkError(ConnectionClosed))
T=1   Handler H rejects; pingCallback resumes in catch block
T=1   catch(error): this._onerror(error) -- SPURIOUS ERROR DELIVERED TO USER
T=1   user onerror fires with 'Connection closed' during intentional shutdown

Comment on lines +772 to +780
this._pingTimer = setInterval(async () => {
try {
await this._requestWithSchema({ method: 'ping' }, EmptyResultSchema, {
timeout: this._pingIntervalMs
});
} catch (error) {
this._onerror(error instanceof Error ? error : new Error(`Periodic ping failed: ${String(error)}`));
}
}, this._pingIntervalMs);
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 The setInterval-based ping loop can briefly have 2 concurrent pings in-flight simultaneously. Since the ping timeout is also set to _pingIntervalMs, the timeout for ping N and the start of ping N+1 are both scheduled for the same wall-clock time — their ordering in the JS event loop is non-deterministic. Using setTimeout with rescheduling after each ping completes would strictly enforce the documented one-at-a-time behavior.

Extended reasoning...

What the bug is

In startPeriodicPing() (protocol.ts:772-780), setInterval fires every _pingIntervalMs regardless of whether the previous async callback has finished. The ping timeout passed to _requestWithSchema is also this._pingIntervalMs. This means the internal setTimeout that will reject ping N with a timeout error and the next setInterval tick that fires ping N+1 are both scheduled for the same wall-clock time (T + pingIntervalMs from ping N start). Which fires first is non-deterministic: it depends on the JS engine timer queue ordering, which is not specified.

The specific code path

The relevant code:

this._pingTimer = setInterval(async () => {
    try {
        await this._requestWithSchema({ method: 'ping' }, EmptyResultSchema, {
            timeout: this._pingIntervalMs  // same value as the interval
        });
    } catch (error) {
        this._onerror(...);
    }
}, this._pingIntervalMs);

Because setInterval does not await the async callback, it fires the next tick at T + pingIntervalMs unconditionally. The request inside the callback has its own setTimeout-based timeout also set to pingIntervalMs. Both macrotasks are enqueued for approximately the same wall-clock time.

Why existing code does not prevent it

The idempotency guard only prevents duplicate timers from multiple startPeriodicPing() calls. It does nothing to serialize individual ping invocations. Once the interval is running, each tick creates a new promise/request that executes concurrently with any still-in-flight ping.

Addressing the refutation and impact

The refutation correctly notes that the overlap is bounded: because the timeout equals the interval, each lingering ping will be canceled at approximately the same time the next one starts. The worst case is 2 concurrent pings, not N. In practice, on a well-functioning connection, pings respond well within the interval and there is no overlap. Only when a ping is slow (response time approaching pingIntervalMs) does the race condition manifest, and even then it is transient and bounded.

However, the PR description explicitly claims: "Ping timeout equals the interval: if a ping takes longer than one interval to respond, it fails with a timeout error rather than stacking up." This is imprecise. Due to timer imprecision, the new ping can start before the old one is cleaned up, meaning they do briefly stack up (2 at once). The stated guarantee is not actually enforced by the implementation.

Step-by-step proof

  1. T=0: setInterval fires, ping Better validation of data coming over the wire #1 is sent. A setTimeout(reject, pingIntervalMs) is registered inside _requestWithSchema.
  2. T=pingIntervalMs: Both the interval tick (ping Race condition between transport start and server connection #2) and the timeout for ping Better validation of data coming over the wire #1 are queued as macrotasks. Two timers set for the same target time have implementation-defined ordering.
  3. If the interval fires first: ping Race condition between transport start and server connection #2 is sent and added to _responseHandlers while ping Better validation of data coming over the wire #1's handler is still present. Both are in-flight simultaneously for a brief moment before ping Better validation of data coming over the wire #1's timeout fires.
  4. If the timeout fires first: ping Better validation of data coming over the wire #1 is cleaned up, then ping Race condition between transport start and server connection #2 is sent — matching the documented behavior.

How to fix it

Replace setInterval with a self-rescheduling setTimeout pattern that only schedules the next ping after the current one completes (success or failure). This strictly serializes pings and matches the documented one-at-a-time intent, at the cost of the interval becoming pingIntervalMs + RTT rather than exactly pingIntervalMs — an acceptable trade-off for health monitoring.

Comment on lines +753 to +766
/**
* Starts sending periodic ping requests at the configured interval.
* Pings are used to verify that the remote side is still responsive.
* Failures are reported via the {@linkcode onerror} callback but do not
* stop the timer; pings continue until the connection is closed.
*
* This is called automatically at the end of {@linkcode connect} when
* `pingIntervalMs` is set. Subclasses that override `connect()` and
* perform additional initialization (e.g., the MCP handshake) may call
* this method after their initialization is complete instead.
*
* Has no effect if periodic ping is already running or if no interval
* is configured.
*/
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 The JSDoc for startPeriodicPing() claims "This is called automatically at the end of connect() when pingIntervalMs is set," but Protocol.connect() ends with await this._transport.start() and never calls startPeriodicPing(). Any custom Protocol subclass that does not override connect() will silently never start periodic pings — update the JSDoc to accurately describe that only Client and Server call this after their respective initialization steps.

Extended reasoning...

What the bug is

The JSDoc comment on startPeriodicPing() (protocol.ts lines 759-762) contains a false statement: "This is called automatically at the end of connect() when pingIntervalMs is set." This implies the base Protocol.connect() is responsible for starting the ping timer when the option is configured.

The specific code path

Looking at Protocol.connect(), the method sets up transport callbacks and ends with await this._transport.start(). There is no call to startPeriodicPing() anywhere in the base implementation. The actual callers are: (1) Client.connect() calls this.startPeriodicPing() after the full MCP initialization handshake completes (and also on the reconnection fast-path); (2) Server._handleInitialized() calls this.startPeriodicPing() when the notifications/initialized notification is received.

Why existing code does not prevent the confusion

The second sentence of the JSDoc partially acknowledges the real behavior: "Subclasses that override connect() and perform additional initialization may call this method after their initialization is complete instead." However, this reads as an opt-out from the stated automatic behavior, not a correction of it. The false first sentence still stands.

Impact

Any developer implementing a custom Protocol subclass who passes pingIntervalMs and does not override connect() will get no periodic pings, despite the documentation promising automatic invocation. The ping timer simply never starts. This is a silent failure — no error is thrown, and _pingIntervalMs is set, so there is no obvious indication that something is wrong.

Step-by-step proof

  1. Create a class extending Protocol without overriding connect()
  2. Instantiate with new MyProtocol({ pingIntervalMs: 30_000 })
  3. Call await myProtocol.connect(transport) — this runs Protocol.connect(), which calls transport.start() and returns
  4. startPeriodicPing() is never invoked; _pingTimer remains undefined
  5. No pings are ever sent, despite _pingIntervalMs being set to 30,000

Fix

Update the JSDoc to accurately state that Client and Server are the actual callers, and that custom subclasses must call this method explicitly after their own initialization is complete.

Regarding the duplicate refutation

One verifier flagged this as a duplicate of bug_004. Whether or not another report covers the same issue, the bug is independently confirmed and real. The JSDoc is factually wrong regardless of duplication — the duplication concern is a process question, not a technical refutation.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Bug Report-2: Missing Periodic Ping Implementation

2 participants